AITopics | language ideology

language ideology

Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs

Watson, Julia, Lee, Sophia, Beekhuizen, Barend, Stevenson, Suzanne

arXiv.org Artificial Intelligence, Sep-20-2024

We study language ideologies in text produced by LLMs through a case study on English gendered language reform (related to role nouns like congressperson/-woman/-man, and singular they). First, we find political bias: when asked to use language that is "correct" or "natural", LLMs use language most similarly to when asked to align with conservative (vs. progressive) values. This shows how LLMs' metalinguistic preferences can implicitly communicate the language ideologies of a particular political group, even in seemingly non-political contexts. Second, we find LLMs exhibit internal inconsistency: LLMs use gender-neutral variants more often when more explicit metalinguistic context is provided. This shows how the language ideologies expressed in text produced by LLMs can vary, which may be unexpected to users. We discuss the broader implications of these findings for value alignment.

noun, pronoun, role noun, (14 more...)

arXiv:2409.13852

Country:

North America > Canada > Ontario > Toronto (0.29)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Singapore (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Government (0.67)
Media > News (0.46)
Leisure & Entertainment (0.46)
Law Enforcement & Public Safety (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Standard Language Ideology in AI-Generated Language

Smith, Genevieve, Fleisig, Eve, Bossi, Madeline, Rustagi, Ishita, Yin, Xavier

arXiv.org Artificial Intelligence, Jun-12-2024

In this position paper, we explore standard language ideology in language generated by large language models (LLMs). First, we outline how standard language ideology is reflected and reinforced in LLMs. We then present a taxonomy of open problems regarding standard language ideology in AI-generated language with implications for minoritized language communities. We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language. Finally, we discuss tensions that remain, including reflecting on what desirable system behavior looks like, as well as advantages and drawbacks of generative AI tools imitating--or often not--different English language varieties. Throughout, we discuss standard language ideology as a manifestation of existing global power structures in and through AI-generated language before ending with questions to move towards alternative, more emancipatory digital futures.

language ideology, language variety, minoritized variety, (12 more...)

arXiv:2406.08726

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > Alameda County > Berkeley (0.05)
Europe > United Kingdom > England > Greater London > London (0.05)
(14 more...)

Genre:

Research Report (0.50)

Industry:

Media (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.51)

Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

Gururangan, Suchin, Card, Dallas, Dreier, Sarah K., Gade, Emily K., Wang, Leroy Z., Wang, Zeyu, Zettlemoyer, Luke, Smith, Noah A.

arXiv.org Artificial Intelligence, Jan-26-2022

Language models increasingly rely on massive web dumps for diverse text data. However, these sources are rife with undesirable content. As such, resources like Wikipedia, books, and newswire often serve as anchors for automatically selecting web text most suitable for language modeling, a process typically referred to as quality filtering. Using a new dataset of U.S. high school newspaper articles -- written by students from across the country -- we investigate whose language is preferred by the quality filter used for GPT-3. We find that newspapers from larger schools, located in wealthier, educated, and urban ZIP codes are more likely to be classified as high quality. We then demonstrate that the filter's measurement of quality is unaligned with other sensible metrics, such as factuality or literary acclaim. We argue that privileging any corpus as high quality entails a language ideology, and more care is needed to construct training corpora for language models, with better transparency and justification for the inclusion or exclusion of various texts.

high quality, quality filter, quality score, (16 more...)

arXiv:2201.10474

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)
Research Report > Experimental Study (0.69)

Industry:

Media > News (1.00)
Leisure & Entertainment > Sports > Football (1.00)
Law (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
